feat: add OpenAI diarization support#651
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds OpenAI speaker diarization support for transcription, enabling speaker-labeled segment output via the ChangesOpenAI Transcription Diarization Feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around line 267-285: The diarization validation is missing a local guard for
responseFormat: update validateDiarizationOptions (used by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the mapped value from mapResponseFormat) is not
one of the allowed values ["json","text","diarized_json"]; ensure transcribe()
cannot send srt/vtt/verbose_json for diarize models by checking
modelOptions.responseFormat (or resolved response format) early and throwing a
clear error stating diarization models only support json, text, and
diarized_json; reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089
📒 Files selected for processing (13)
.changeset/openai-transcription-diarization.mddocs/adapters/openai.mddocs/comparison/vercel-ai-sdk.mddocs/media/generation-hooks.mddocs/media/transcription.mddocs/reference/interfaces/TranscriptionOptions.mdpackages/ai-client/src/generation-types.tspackages/ai-openai/src/adapters/transcription.tspackages/ai-openai/src/audio/transcription-provider-options.tspackages/ai-openai/tests/transcription-adapter.test.tspackages/ai/skills/ai-core/media-generation/SKILL.mdpackages/ai/src/activities/generateTranscription/index.tspackages/ai/src/types.ts
|
Actionable comments posted: 0 |
05dfb53 to
fbb57a0
Compare
|
View your CI Pipeline Execution ↗ for commit 58aa20c
☁️ Nx Cloud last updated this comment at |
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-mcp
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
commit: |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2c455a0-25d6-4921-8f26-77965d2791be
📒 Files selected for processing (6)
.changeset/openai-transcription-diarization.mddocs/adapters/openai.mddocs/comparison/vercel-ai-sdk.mddocs/media/generation-hooks.mddocs/media/transcription.mddocs/reference/interfaces/TranscriptionOptions.md
✅ Files skipped from review due to trivial changes (5)
- .changeset/openai-transcription-diarization.md
- docs/media/generation-hooks.md
- docs/comparison/vercel-ai-sdk.md
- docs/adapters/openai.md
- docs/reference/interfaces/TranscriptionOptions.md
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2c455a0-25d6-4921-8f26-77965d2791be
📒 Files selected for processing (6)
.changeset/openai-transcription-diarization.mddocs/adapters/openai.mddocs/comparison/vercel-ai-sdk.mddocs/media/generation-hooks.mddocs/media/transcription.mddocs/reference/interfaces/TranscriptionOptions.md
✅ Files skipped from review due to trivial changes (5)
- .changeset/openai-transcription-diarization.md
- docs/media/generation-hooks.md
- docs/comparison/vercel-ai-sdk.md
- docs/adapters/openai.md
- docs/reference/interfaces/TranscriptionOptions.md
🛑 Comments failed to post (1)
docs/media/transcription.md (1)
561-561:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the provider’s latest OpenAI transcription model in this example.
This changed snippet still hardcodes
whisper-1; please update it to the latest OpenAI transcription model defined in the adaptermodel-meta.tsto keep docs aligned with project policy.As per coding guidelines: “Use the latest model per provider in documentation example code, sourced from each adapter's
model-meta.ts(newestgpt-*,claude-*,gemini-*, …)”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/media/transcription.md` at line 561, The example hardcodes 'whisper-1' in the createOpenaiTranscription call; update the docs to use the provider's latest transcription model constant exported from the OpenAI adapter's model-meta.ts instead of a string literal. Import or reference the exported latest-model symbol from that file (e.g., the adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that symbol into createOpenaiTranscription so the docs always use the adapter-defined current OpenAI transcription model.
|
Hi @8times4, thank you for this. Would you be able to create an e2e test for this using aimock? The tests are in the e2e test package. Ideally, adding a way to see the results on one of the ts-react-chat example pages would be great as well |
Code reviewFound 3 issues:
ai/packages/ai-openai/src/adapters/transcription.ts Lines 140 to 150 in 05dfb53
Lines 1723 to 1732 in 05dfb53
ai/packages/ai-openai/src/adapters/transcription.ts Lines 339 to 370 in 05dfb53 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
- Removed `transcribe-provider-options.ts` file and integrated its options into `transcription-provider-options.ts`. - Updated documentation to reflect changes in response formats, emphasizing the use of `modelOptions.response_format` for diarization. - Enhanced the transcription adapter to handle new model options and response formats, including support for speaker diarization. - Adjusted various components and tests to accommodate the new structure and ensure compatibility with the updated transcription features.
|
Thanks for the review @tombeckenham, should be fixed now. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
packages/ai/skills/ai-core/media-generation/SKILL.md (1)
284-286: ⚡ Quick winClarify the diarization contract.
chunking_strategy: 'auto'reads like an optional default here, but the adapter enforces that setting forgpt-4o-transcribe-diarize. Please phrase it as required behavior, not a caller-tunable default.♻️ Suggested wording
-For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize'). -It defaults to modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto'; +For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize'). +The adapter enforces modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto'; do not pass prompt, include, or timestamp_granularities with this model.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/skills/ai-core/media-generation/SKILL.md` around lines 284 - 286, The documentation wording implies chunking_strategy: 'auto' is an optional default, but the adapter enforces that value for openaiTranscription('gpt-4o-transcribe-diarize'); update the sentence in SKILL.md to state that chunking_strategy must be 'auto' (required behavior) rather than a caller-tunable default and keep the note that modelOptions.response_format defaults to 'diarized_json' and callers must not pass prompt, include, or timestamp_granularities when using openaiTranscription('gpt-4o-transcribe-diarize').
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/ts-react-chat/src/lib/server-fns.ts`:
- Around line 84-86: The Zod enum used for transcription response formats
(TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA) is missing 'diarized_json', causing
callers to be rejected before reaching generateTranscription; update the enum
used by both entrypoints to include 'diarized_json' (i.e., add the
'diarized_json' string value to TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA and the
corresponding response-format validator used by the other entrypoint) so
diarized transcription requests validate successfully.
In `@packages/ai/src/types.ts`:
- Around line 1712-1717: The TranscriptionResponseFormat union is missing the
new 'diarized_json' member, causing type errors when callers set
TranscriptionOptions.responseFormat to that value; update the
TranscriptionResponseFormat type to include 'diarized_json' so the shared
contract matches the OpenAI provider, and ensure any other identical union (the
duplicate around the TranscriptionOptions declaration) is updated as well so
both TranscriptionResponseFormat and any repeated type declarations accept
'diarized_json'.
In `@testing/e2e/src/components/TranscriptionUI.tsx`:
- Around line 35-49: The default diarization E2E payload currently hardcodes
chunking_strategy ('chunking_strategy: "auto"') inside transcriptionInput ->
modelOptions when isDiarization is true; remove the chunking_strategy field from
transcriptionInput so the test covers the omitted-field/defaulting branch (leave
known_speaker_names and known_speaker_references as-is), and if you still want
explicit-option coverage add a separate test that constructs a
transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the
passthrough path; update references to isDiarization, transcriptionInput, and
modelOptions accordingly.
In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 99-108: The factory currently selects openaiTranscriptionModel
based on the optional feature param (in the openaiTranscriptionModel variable)
which can mismatch the actual transcription options; change
createOpenaiTranscription usage in the factories to derive the model from the
provided transcription options (e.g., inspect responseFormat and modelOptions
for diarization flags such as responseFormat === 'diarized_json' or
modelOptions.diarize) instead of relying on feature, or validate and reject when
diarization-specific options are present while feature !==
'transcription-diarization'; update the logic around openaiTranscriptionModel
and createOpenaiTranscription so diarization requests choose
'gpt-4o-transcribe-diarize' or fail fast.
---
Nitpick comments:
In `@packages/ai/skills/ai-core/media-generation/SKILL.md`:
- Around line 284-286: The documentation wording implies chunking_strategy:
'auto' is an optional default, but the adapter enforces that value for
openaiTranscription('gpt-4o-transcribe-diarize'); update the sentence in
SKILL.md to state that chunking_strategy must be 'auto' (required behavior)
rather than a caller-tunable default and keep the note that
modelOptions.response_format defaults to 'diarized_json' and callers must not
pass prompt, include, or timestamp_granularities when using
openaiTranscription('gpt-4o-transcribe-diarize').
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3121ef25-add2-4697-a74d-2dbfb49daa47
📒 Files selected for processing (30)
docs/adapters/openai.mddocs/comparison/vercel-ai-sdk.mddocs/media/generation-hooks.mddocs/media/transcription.mdexamples/ts-react-chat/src/lib/audio-providers.tsexamples/ts-react-chat/src/lib/server-audio-adapters.tsexamples/ts-react-chat/src/lib/server-fns.tsexamples/ts-react-chat/src/routes/api.transcribe.tsexamples/ts-react-chat/src/routes/generations.transcription.tsxknip.jsonpackages/ai-client/src/generation-types.tspackages/ai-openai/src/adapters/transcription.tspackages/ai-openai/src/audio/transcribe-provider-options.tspackages/ai-openai/src/audio/transcription-provider-options.tspackages/ai-openai/tests/transcription-adapter.test.tspackages/ai/skills/ai-core/media-generation/SKILL.mdpackages/ai/src/activities/generateTranscription/index.tspackages/ai/src/types.tstesting/e2e/fixtures/transcription/basic.jsontesting/e2e/fixtures/transcription/diarization.jsontesting/e2e/src/components/TranscriptionUI.tsxtesting/e2e/src/lib/feature-support.tstesting/e2e/src/lib/features.tstesting/e2e/src/lib/media-providers.tstesting/e2e/src/lib/server-functions.tstesting/e2e/src/lib/types.tstesting/e2e/src/routes/$provider/$feature.tsxtesting/e2e/src/routes/api.transcription.stream.tstesting/e2e/src/routes/api.transcription.tstesting/e2e/tests/transcription.spec.ts
💤 Files with no reviewable changes (2)
- knip.json
- packages/ai-openai/src/audio/transcribe-provider-options.ts
✅ Files skipped from review due to trivial changes (5)
- testing/e2e/src/lib/types.ts
- docs/comparison/vercel-ai-sdk.md
- testing/e2e/fixtures/transcription/diarization.json
- docs/media/generation-hooks.md
- docs/media/transcription.md
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/ai/src/activities/generateTranscription/index.ts
- packages/ai-client/src/generation-types.ts
- docs/adapters/openai.md
- packages/ai-openai/tests/transcription-adapter.test.ts
- packages/ai-openai/src/adapters/transcription.ts
| const isDiarization = feature === 'transcription-diarization' | ||
| const transcriptionInput: TranscriptionGenerateInput = { | ||
| audio: TEST_AUDIO_BASE64, | ||
| language: 'en', | ||
| ...(isDiarization | ||
| ? { | ||
| modelOptions: { | ||
| response_format: 'diarized_json', | ||
| chunking_strategy: 'auto', | ||
| known_speaker_names: ['agent', 'customer'], | ||
| known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64], | ||
| }, | ||
| } | ||
| : {}), | ||
| } |
There was a problem hiding this comment.
Leave chunking_strategy out of the default diarization E2E payload.
Line 43 hardcodes chunking_strategy: 'auto', so the new Playwright flow only proves the explicit-option path. The omitted-field/defaulting branch can regress without any of the three diarization E2E modes failing. I’d make the default test payload minimal and add a separate explicit-option case only if you still want passthrough coverage.
Suggested change
const transcriptionInput: TranscriptionGenerateInput = {
audio: TEST_AUDIO_BASE64,
language: 'en',
...(isDiarization
? {
modelOptions: {
response_format: 'diarized_json',
- chunking_strategy: 'auto',
known_speaker_names: ['agent', 'customer'],
known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
},
}
: {}),
}As per coding guidelines, testing/e2e/**/*.spec.ts: every feature, bug fix, or behavior change must include E2E coverage using Playwright + aimock.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const isDiarization = feature === 'transcription-diarization' | |
| const transcriptionInput: TranscriptionGenerateInput = { | |
| audio: TEST_AUDIO_BASE64, | |
| language: 'en', | |
| ...(isDiarization | |
| ? { | |
| modelOptions: { | |
| response_format: 'diarized_json', | |
| chunking_strategy: 'auto', | |
| known_speaker_names: ['agent', 'customer'], | |
| known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64], | |
| }, | |
| } | |
| : {}), | |
| } | |
| const isDiarization = feature === 'transcription-diarization' | |
| const transcriptionInput: TranscriptionGenerateInput = { | |
| audio: TEST_AUDIO_BASE64, | |
| language: 'en', | |
| ...(isDiarization | |
| ? { | |
| modelOptions: { | |
| response_format: 'diarized_json', | |
| known_speaker_names: ['agent', 'customer'], | |
| known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64], | |
| }, | |
| } | |
| : {}), | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@testing/e2e/src/components/TranscriptionUI.tsx` around lines 35 - 49, The
default diarization E2E payload currently hardcodes chunking_strategy
('chunking_strategy: "auto"') inside transcriptionInput -> modelOptions when
isDiarization is true; remove the chunking_strategy field from
transcriptionInput so the test covers the omitted-field/defaulting branch (leave
known_speaker_names and known_speaker_references as-is), and if you still want
explicit-option coverage add a separate test that constructs a
transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the
passthrough path; update references to isDiarization, transcriptionInput, and
modelOptions accordingly.
Source: Coding guidelines
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 42-50: The code currently lets an internal flag
modelOptions.diarize flow into the OpenAI SDK; update the transcription request
construction to strip the diarize property before spreading modelOptions into
the SDK call—e.g., in the OpenAI transcription adapter where the request is
built, clone modelOptions and delete or omit the diarize key (while still using
getOpenaiTranscriptionModel(...) for detection), then spread the sanitized
object (e.g., sanitizedModelOptions) into request: { model, file,
...sanitizedModelOptions } so diarize is never sent upstream.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 61c5ddd3-e650-49bd-b749-865a32276717
📒 Files selected for processing (8)
examples/ts-react-chat/src/lib/server-fns.tsexamples/ts-react-chat/src/routes/api.transcribe.tspackages/ai-openai/src/adapters/transcription.tspackages/ai/src/types.tstesting/e2e/src/lib/media-providers.tstesting/e2e/src/lib/server-functions.tstesting/e2e/src/routes/api.transcription.stream.tstesting/e2e/src/routes/api.transcription.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- examples/ts-react-chat/src/routes/api.transcribe.ts
- testing/e2e/src/lib/server-functions.ts
- packages/ai-openai/src/adapters/transcription.ts
- examples/ts-react-chat/src/lib/server-fns.ts
| function getOpenaiTranscriptionModel(options: TranscriptionAdapterOptions) { | ||
| const modelOptions = options.modelOptions | ||
| const isDiarizationRequest = | ||
| options.responseFormat === 'diarized_json' || | ||
| modelOptions?.response_format === 'diarized_json' || | ||
| modelOptions?.diarize === true || | ||
| modelOptions?.chunking_strategy !== undefined || | ||
| modelOptions?.known_speaker_names !== undefined || | ||
| modelOptions?.known_speaker_references !== undefined |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Find where diarize is populated in E2E request construction"
rg -n --type=ts -C2 '\bdiarize\b|response_format|chunking_strategy|known_speaker_' testing/e2e/src
echo
echo "2) Confirm modelOptions is forwarded from API routes"
rg -n --type=ts -C3 'generateTranscription\(|modelOptions,' \
testing/e2e/src/routes/api.transcription.ts \
testing/e2e/src/routes/api.transcription.stream.ts
echo
echo "3) Confirm OpenAI adapter request spread behavior"
rg -n --type=ts -C5 '\.\.\.modelOptions|TranscriptionCreateParamsNonStreaming|request\.response_format' \
packages/ai-openai/src/adapters/transcription.tsRepository: TanStack/ai
Length of output: 5925
Prevent internal modelOptions.diarize from reaching the OpenAI SDK
- Current E2E payloads don’t set
modelOptions.diarize(they useresponse_format: 'diarized_json',chunking_strategy, andknown_speaker_*), andmodelOptionsis forwarded unchanged by both transcription routes. - The OpenAI adapter still spreads
...modelOptionsinto the SDK request (request: { model, file, ...modelOptions }), so if any caller ever addsmodelOptions.diarize, it would be sent upstream as an unsupported parameter—omitdiarizebefore building the request.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@testing/e2e/src/lib/media-providers.ts` around lines 42 - 50, The code
currently lets an internal flag modelOptions.diarize flow into the OpenAI SDK;
update the transcription request construction to strip the diarize property
before spreading modelOptions into the SDK call—e.g., in the OpenAI
transcription adapter where the request is built, clone modelOptions and delete
or omit the diarize key (while still using getOpenaiTranscriptionModel(...) for
detection), then spread the sanitized object (e.g., sanitizedModelOptions) into
request: { model, file, ...sanitizedModelOptions } so diarize is never sent
upstream.
tombeckenham
left a comment
There was a problem hiding this comment.
Thanks for the thorough follow-up here — the E2E coverage across all three transports, the openai-diarize option on the example page with speaker-labeled segments, the shared TranscriptionResponseFormat extraction, and the much more complete validateDiarizationOptions all address my earlier review. The response-mode discriminant ('diarized' | 'verbose' | 'plain') + request-plan refactor is a genuine improvement. 🙏
I'm requesting changes on one architectural point, plus the small cleanups it implies.
Requested change — keep diarized_json out of the shared union
diarized_json was added to the shared, cross-provider TranscriptionResponseFormat (packages/ai/src/types.ts:1712-1718). The problem: only the OpenAI adapter reads the top-level responseFormat. The other transcription adapters drive diarization from a boolean instead and ignore responseFormat entirely:
packages/ai-elevenlabs/src/adapters/transcription.ts→modelOptions.diarize: booleanpackages/ai-grok/src/adapters/transcription.ts→modelOptions.diarize: booleanpackages/ai-fal/...→ no diarization support
So after this change, responseFormat: 'diarized_json' type-checks for every provider but is silently ignored by three of them. diarized_json isn't a portable format — it's OpenAI's wire-format literal for one model (gpt-4o-transcribe-diarize). The shared union should only advertise formats a generic caller can actually request.
Please keep diarized_json in OpenAITranscriptionResponseFormat only and revert the shared TranscriptionResponseFormat to the portable set ('json' | 'text' | 'srt' | 'verbose_json' | 'vtt'). Diarization on OpenAI is already driven through modelOptions.response_format: 'diarized_json' (that's what the example and E2E use), so nothing in the feature path is lost.
Two cleanups fall out of this and should land together:
OpenAITranscriptionResponseFormat = TranscriptionResponseFormat | 'diarized_json'(packages/ai-openai/src/audio/transcription-provider-options.ts:4-6) is currently redundant —diarized_jsonis already in the shared union, so it collapses to exactlyTranscriptionResponseFormat. Oncediarized_jsonmoves out, this alias becomes a real extension again. ✅const topLevelResponseFormat = responseFormat as OpenAITranscriptionResponseFormat | undefined(packages/ai-openai/src/adapters/transcription.ts:253-255) is a no-op cast today and becomes a safe widening after the change — drop theasand assign directly.
Direction we want — a portable diarize on/off
For where this should go cross-provider: the portable concept is diarization on/off, not a format string. And the output is already normalized — TranscriptionSegment.speaker?: string exists in the shared type and ElevenLabs already populates it. So the clean cross-provider surface is a top-level diarize?: boolean on TranscriptionOptions that each adapter maps to its own mechanism (OpenAI → diarize model + diarized_json; ElevenLabs/Grok → diarize: true), with results unified via segment.speaker.
I don't want to balloon this PR's scope. Either is fine with me:
- add a top-level
diarize?: booleanwired for OpenAI only in this PR (ElevenLabs/Grok wiring as a follow-up), or - keep this PR OpenAI-scoped via
modelOptionsand I'll open a follow-up issue for the portable flag.
Let me know which you'd prefer. (Note for whoever does the portable work: Grok currently types its speaker as number while the shared segment.speaker is string — that needs reconciling.)
Minor
- Validation gap (
packages/ai-openai/src/adapters/transcription.ts:458-467): the matching-length check only fires when bothknown_speaker_namesandknown_speaker_referencesare present. A lone array (one provided without the other) passes local validation and defers to a late opaque 400 — the exact thing this validator exists to prevent. Please reject early when exactly one of the two is provided. - Dead branch (
testing/e2e/src/lib/media-providers.ts:47):modelOptions?.diarize === trueis never set by any caller (the real switch isresponse_format: 'diarized_json'), anddiarizeisn't a valid OpenAI param. Please drop that clause so model selection is driven only by real fields.
Blocking: the shared-union scoping + the two cleanups it implies. The rest is quick polish. Thanks again!
🤖 Generated with Claude Code
🎯 Changes
This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript
✅ Checklist
pnpm run test:pr.🚀 Release Impact
Summary by CodeRabbit
New Features
Documentation